Exploring dependence between categorical variables: Benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms

Authors

  • Michail Papathomas
  • Sylvia Richardson
Abstract

This manuscript relates two approaches for exploring complex dependence structures between categorical variables: Bayesian partitioning of the covariate space, incorporating a variable selection procedure that highlights the covariates that drive the clustering, and log-linear modelling with interaction terms. We derive theoretical results on this relation and discuss whether they can be employed to assist log-linear model determination, demonstrating advantages and limitations with simulated and real data sets. The main advantage concerns sparse contingency tables. Inferences from clustering can potentially reduce the number of covariates considered and, subsequently, the number of competing log-linear models, making the exploration of the model space feasible. Variable selection within clustering can inform on marginal independence in general, thus allowing for a more efficient exploration of the log-linear model space. However, we show that the clustering structure is not informative on the existence of interactions in a consistent manner. This work is of interest to those who use log-linear models, as well as to practitioners, such as epidemiologists, who use clustering models to reduce the dimensionality of the data and to reveal interesting patterns in how covariates combine.
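As a rough illustration of what an interaction term in a log-linear model encodes, the sketch below works through the simplest possible case, a 2x2 contingency table. It is not the authors' method and does not involve clustering or variable selection. It assumes corner-point (reference-cell) constraints, under which the single interaction parameter of the saturated model equals the log odds ratio and is zero exactly when the two variables are independent. The function names and the example counts are hypothetical.

```python
import math


def interaction_term(table):
    """Interaction parameter of the saturated log-linear model for a 2x2
    table, assuming corner-point constraints with cell (0, 0) as reference.
    Under that assumption it equals the log odds ratio."""
    (n00, n01), (n10, n11) = table
    return math.log((n00 * n11) / (n01 * n10))


def independence_fit(table):
    """Expected cell counts under the independence (no-interaction) model:
    row total * column total / grand total."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / total for c in col_totals] for r in row_totals]


# Hypothetical counts, chosen only for illustration.
independent = [[20, 30], [40, 60]]   # rows are proportional: no association
associated = [[40, 10], [10, 40]]    # strong positive association

print(interaction_term(independent))  # zero interaction
print(interaction_term(associated))   # log(16), a non-zero interaction
print(independence_fit(independent))  # reproduces the observed counts
```

For the first table the independence model reproduces the observed counts, so no interaction term is needed; for the second the interaction is clearly non-zero. With many covariates the number of candidate interaction structures grows very quickly, which is why removing covariates before the model search, as the abstract describes, can matter.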

Volume 173

Publication date: 2016